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Summary. In this paper we present a locally and dimension-adaptive sparse grid method for 
£f~^ interpolation and integration of high-dimensional functions with discontinuities. The proposed 

algorithm combines the strengths of the generalised sparse grid algorithm and hierarchical 
surplus-guided local adaptivity. A high-degree basis is used to obtain a high-order method 
which, given sufficient smoothness, performs significantly better than the piecewise-linear 
basis. The underlying generalised sparse grid algorithm greedily selects the dimensions and 
variable interactions that contribute most to the variability of a function. The hierarchical 
surplus of points within the sparse grid is used as an error criterion for local refinement with 
the aim of concentrating computational effort within rapidly varying or discontinuous regions. 
This approach limits the number of points that are invested in 'unimportant' dimensions and 

1 i regions within the high-dimensional domain. We show the utility of the proposed method for 

non-smooth functions with hundreds of variables. 

> 

2 1 Introduction 

O 

f^**) The need for interpolation and integration of high-dimensional functions arises in 

f-^ many fields including finance, physics, chemistry, and uncertainty quantification. 

t— I Sparse grids have emerged as an extremely useful tool to construct such multi- 

I dimensional approximations. They have been extensively used for high-dimensional 

interpolation [HE) an d quadrature [2, 6| and have been shown, under certain condi- 
tions, to obtain significantly higher rates of convergence than many existing methods. 
For example, the complexity of the Monte Carlo method with n samples is 0(rT 1 / 2 ), 
whereas the complexity of the sparse grid method lfl3l is 0(n~ r (\ogn)( d ~' L ^ r+ ^) 
when used to approximate li-dimensional integrands which have bounded mixed par- 
tial derivatives of order r. Sparse grids achieve faster rates of convergence by taking 
advantage of higher smoothness and lower-effective dimensionality of the integrand. 

Standard sparse grids are isotropic, treating all dimensions equally. Although an 
advance on alternative methods, such approximations can still be improved. Many 
problems vary rapidly in only some dimensions, remaining less variable in other 
dimensions. Consequently it is advantageous to increase the level of accuracy only 
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in certain highly varying dimensions, resulting in so-called adaptive or anisotropic 
grids. In some cases the important dimensions can be determined a priori, but in most 
cases the grid points must be chosen during the computational procedure. 

Gerstner and Griebel [6] developed a dimension-adaptive tensor-product quadra- 
ture method to approximate high-dimensional functions by a sum of lower-dimensional 
terms. The method is based upon a generalisation of the traditional isotropic sparse 
grid index set that, given appropriate error estimators, can automatically concentrate 
computational effort in important dimensions. Recently Griebel and Hotlz |8 1 devel- 
oped a new general class of dimension-adaptive quadrature methods. The quadra- 
ture schemes detect and exploit the low effect dimensionality of a function. This is 
achieved by truncation and discretization of the anchored- ANOVA decomposition of 
the function being approximated. This method has been used successfully to estimate 
high-dimensional integrals arising in finance. 

In addition to dimension based adaptivity, efficiency in approximation of high- 
dimensional functions can also be obtained through local adaptation. Locally adap- 
tive sparse grids were first use by Griebel [7 | to solve the solution of partial differen- 
tial equation and have also been used used to quantify uncertainty in mathematical 
models |9], interpolate and integrate functions lfT2ll and scattered data approximation 
problem ATI . The refinement of the sparse grid is guided by the magnitude of the 
so-called hierarchical surplus, which is the difference between the true function and 
the approximation at a new grid point before that point is used in the interpolation. 
When a grid point is identified for refinement, new points are invested locally around 
that point in every dimension. These new points are subsequently refined. The re- 
sulting method automatically concentrates function evaluations in rapidly varying or 
discontinuous regions. 

The aforementioned locally adaptive sparse grid methods are implicitly dimension- 
adaptive, but often points are constructed unnecessarily in 'unimportant' dimensions. 
In comparison, the generalised sparse grid algorithm [6 1 provides an efficient means 
of identifying the effective dimensionality of a problem and restricting function eval- 
uations to that sub-dimensional space. This method performs extremely well when 
the solution is smooth. However the efficiency of the generalised sparse grid method 
can be significantly improved when only small regions of the input space contribute 
to the model's variability. 

In this paper we combine local refinement with the dimension-adaptive algorithm 
of the generalised sparse grid method to construct an efficient high-dimensional in- 
terpolation method. Furthermore we utilise the localised polynomial basis proposed 
by Bungartz J3] to create a higher-order method which achieves fast rates of con- 
vergence in smooth regions and accuracy, comparable to linear methods, around 
discontinuities. We coin this approach the /i-Adaptive Generalised Sparse Grid (/;- 
GSG) method. The convergence of the proposed method is analysed, with respect to 
the order of the local polynomial basis and the dimensionality of the input space. 
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2 Adaptive sparse grids 

In this paper we will attempt to interpolate and integrate functions / : £2 —> M defined 
on a (i-dimensional bounded domain £2 . We need not know the closed form of /, we 
only require that the function / can be evaluated at arbitrary points in £2 using a 
numerical code. 



2.1 Interpolation 

To construct an interpolant off, we must first discretize £2 . Without loss of generality 
let us consider functions defined on the of-dimensional unit hypercube £2 = [0, l] d . 
Sparse grids are a direct sum of anisotropic grids £2\ on the domain £2 where i = 
(z'l, . . . , id) & N rf is a multi-index denoting the level of refinement of the grid in each 
dimension d. Each grid £2y is a tensor product of one-dimensional grids 

£2i = (x ii i,...,Xi : m i ) (1) 

where w, is odd and represents the number of points xu in the /th level one dimen- 
sional grid. Specifically 

d 

A = (g)A„ 

n=\ 

which consists of the points Xj j = (jtjj j, , . . . ,Xj d j.) and where i indicates the level of 
refinement and j denotes the location of a given grid point. The exact coordinates of 
each point and the total number of points m,-, x • • • x m,-. is dependent on the type of 
one-dimensional grids used. 

Each grid £2[ is associated with a discrete approximation space V; and a set of 
basis functions that span the discrete space. The types of basis functions that can 
be used are dependent on the type of one-dimensional grids employed. The most 
frequently used and simplest choice are the multi-linear piecewise basis functions H 
|71[9], based upon the one-dimensional formula 



centered at the points 



1 - (m; - 1) \x - Xjj | if \x - Xij | < hi 
otherwise 



jy.li i>\ and < j < mi 
0.5 i = 1 



where h t — 1 / (m; — 1 ) . 

These ID basis functions can be used to form a set of af-dimensional basis func- 
tions 

n=l 
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which span the discrete space Vj. Specifically 

Vi = span {f<j \j„ =1,.. .,m; n , n = \,...,d] (2) 
The spaces Vj can be used to define hierarchical difference spaces W\ 

d 

Wi = Vi\0Vi-e„ 0) 

«=1 

These spaces consist of all the basis functions j e Vj with associated points Xj j that 
are not associated with any of the basis functions in spaces smaller than V;. A discrete 
space Vu is smaller than a space V; if k < i. Setting Vj = and using |2) and <(3j we 
obtain 

W i = span{fi.j|jeB i } 

where 

Bi = {j : j„ = l,...,m,„, y'oddn = l,...,c/} (4) 

These hierarchical difference spaces can be used to decompose the input space Q. = 
V such that 

A-,=0 k d =0 keR^ 

For numerical purposes we must truncate the number of difference spaces used to 
construct V to some level I. Specifically the classical finite dimensional sparse grid 
space is defined by 

y$t = w (5) 

i«ii<i 

With such a decomposition any function /(x) e V can be approximated by 

/uW= E Lv,jf{j(x) (6) 

|i|l</jeBj 

where v\ j € K are the coefficient values of the hierarchical product basis, also known 
as the hierarchical surplus. 

The size of the sparse grid space is \V^\ = ff(hj l ■ \\og 2 hi\ d - 1 ) = 0{2 l ■ l*' 1 ) 
which is a significant reduction on the ff(2 l ' d ) number of points required by the full 
tensor product space vfj obtained by choosing = maxo<„< f / i„ <l. 

2.2 Quadrature 

The extension from interpolation to quadrature is straightforward. We can approxi- 
mate the integral of a function / 



'[/«]= / /«<*M(x) 
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using the hierarchical sparse grid interpolant |6]). Utilizing this formulation these 
integrals can be approximated by 



can be calculated easily and with no need for extra function evaluations once the 
interpolant has been constructed. One simply needs to store the volumes of the high- 
order basis functions. For dfl = dx these volumes can be calculated analytically. 

2.3 A Local High-Order Basis 

Sparse grids are not restricted to piecewise multi-linear basis functions that are con- 
structed on equidistant grids. Various formulations exist. In the following we propose 
a high order local basis for interpolation and quadrature. This basis was first proposed 
by Bungartz [3 1 for the solution of partial differential equations. The local nature of 
the basis functions allows for local adaptivity and restricts the effects of Gibbs type 
phenomenon experienced by global polynomial approximation whilst still achieving 
polynomial convergence in smooth regions. 

As with the linear case, we restrict our attention to grids Qj with mesh spacing 
that is equidistant with respect to each individual dimension but may vary between 
dimensions. Let ^fy denote a one-dimensional polynomial of degree p defined on 
the interval [xu — hi,xu +hi\. This localized support is essential for application of 
the high-order basis to non-smooth problems and the ultimate goal of an adaptive 
method. Uniquely defining this basis requires p + 1 conditions that ¥Jy must satisfy. 

Here we take advantage of the fact that each point x, j has an ancestry. The one- 
dimensional equidistant points xij can be considered as a tree-like data structure. 
The coordinate of each point is defined uniquely by the level i and the position j. 
With this observation we can define a local p-th order polynomial using the points 
Xij — hi, Xjj, Xi j + hi and the next p — 2 closest hierarchical ancestors of Xjj. 

Definition 1. Given the one-dimensional grid 12, with grid points defined according 
to \2.\\ the p-th order basis function 9Jy is the hierarchical interpolant of the point 
Xij — hi, Xij, Xij + hj and the next p — 2 closest hierarchical ancestors ofxi j restricted 
to the local support [xtj — hi,Xij + hj]. Specifically by renaming all the points except 
Xij in ascending order as xq, . . . ,x p the piecewise Lagrange basis can be written 




</jeffi 



where the weights 
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The order p of the basis function is dependent on the hierarchical level i of Xu. 
For p > 2, p + 1 ancestors are needed to construct the basis (f£ ,-. On level one, only 
one ancestor (x = 0.5, on level zero) is available and thus only linear basis functions 
can be used. On level two, only two ancestors exist, and thus only linear or quadratic 
basis functions can be used. Subsequently basis functions of degree p can only be 
used when i > p. This represents a slight modification of the approach employed 
by Bungartz [3] who allowed basis functions of degree p to be used when i > p — 
1. Bungartz approach was designed for sparse grids with homogeneous boundary 
conditions. 

Throughout the remainder of this manuscript we restrict our attention to basis 
functions with fixed maximum degree p max . That is, the order of the basis is increased 
with each level of the sparse grid until the order of the basis is p max — 1 . On all 
subsequent levels the order of the basis is kept constant at p max - The tensor product 
construction of the multi-dimensional basis means that the degree p = (/?,-, . . . ,pj) 
of a ^-dimensional basis function f{ j must satisfy 

<p n =min{p max ,/„}, /„>0 ,n=l,...,d 

Here p n —0 represents the constant function centred at the midpoint of [0, 1]. 

2.4 Adaptivity 

The classical sparse grids presented in Section [2T| are based upon the index set 

,y = {ieN d : </} 

This construction delays the curse of dimensionality by assuming that the importance 
of any interaction between a subset of a function's variables decreases as the num- 
ber of variables involved in the interaction (interaction order) increases. Although an 
advance on full tensor product spaces, such approximations can still be improved. 
The classical sparse grid construction treats all dimensions equally and all interac- 
tions of the same order equally. In practice, often only a small subset of variables 
and interactions contributes significantly to the variability of the function /. More- 
over, frequently only small regions within the input space possess high variability. In 
some cases the important dimensions, interactions and regions can be determined a 
priori, but in most cases these properties must be identified during the computational 
procedure. 

The generalised sparse grid method [6| is extremely effective at determining the 
dimensions and interactions that contribute significantly to the function variability, 
according to some predefined measure. However the efficiency of this method dete- 
riorates when a large proportion of the function variability is concentrated in small 
regions of the input space. In contrast to the generalised sparse grid method, locally- 
adaptive methods, such as that of Ma and Zabaras |9|, attempt to reduce the number 
of points in a sparse grid by concentrating refinement only in rapidly varying or dis- 
continuous regions. This method is implicitly dimension-adaptive but often points 
are constructed necessarily in 'unimportant' dimensions ifTUl . 
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In this section we propose a method which combines the strengths of both local- 
adaptivity and the generalised sparse grid algorithm. We coin this approach the h- 
Adaptive Generalised Sparse Grid (/z-GSG) method. 



Generalised Sparse Grid (GSG) Algorithm 

Gerstner [6| generalised the sparse grid construction by considering the index sets 
based upon the admissibility criterion 

i-e ; G^forl<j<d,jy>l (7) 

This so-called generalised sparse grid method [6| is extremely effective at deter- 
mining the hierarchical difference spaces that contribute significantly to the function 
variability, according to some predefined measure. 

The generalised sparse grid method is a greedy algorithm which attempts to find 
the index set such that for a given number of points the approximation error is 
minimized. Starting with ,J? — {0} the index set is built iteratively by searching 
the forward neighbourhood of the current index set for new admissible indices. The 
forward neighbourhood of an index i is the set of d indices {i + e 7 : 1 < j < d}. 
Similarly, the backwards neighbourhood is just {i — e,- : 1 < j < d}. 

Once the forward neighbourhood has been identified, each forward neighbour is 
checked for admissibility using Q. The grid points associated with each admissible 
index are then evaluated and the error of these spaces calculated. The calculation of 
these errors will be addressed shortly. The forward neighbour with the largest error 
is then added to the current index set and the set of admissible indices is updated. 

To facilitate easy computation of new admissible indices we partition the index 
set into two disjoint sets 6 and si ', which Gerstner [6| refers to as the old and 
active index sets, respectively. The active index set contains all the indices in 
that have been constructed but whose forward neighbours have not been considered. 
The old index set contains all the indices remaining in the current index set . The 
algorithm proceeds by searching the forward neighbourhood of the index i £ with 
the largest error for admissible indices. All the new admissible index sets are added 
to the active index set srf and the index i is then added to the old index set ff. This 
process is repeated until a global error is below a predefined tolerance S. 

The exact error associated with each index i is unknown. Consequently each 
time an index i is deemed admissible an approximation of the error r; must be used. 
Numerous error criteria can be utilised. Here we employ the error measure 

E v, j • w 

These index-based error criteria can be used to approximate the global error. We 
propose the following global error indicator r 

r = E r i 
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When r < e the generalised sparse grid algorithm is terminated. The generalised 
sparse grid algorithm is outlined in Algorithm[T] 



Algorithm 1 Generalised Sparse Grid Approximation 

i •(' 0. 

si := {i} 

r := r\ 

while r > £ do 

select i £ srf with largest error indicator r\ 

r := r— rj 
for & := 1, ... ,d do 
j = i + 

if j — e„ e O V n = 1, . . . ,d then 
vi := .e/U{j} 
CreateGrid(j) 
r:=r + rj 
end if 
end for 
end while 



Three steps of the generalised sparse grid algorithm depicting the construction 
of the sparse grid index set are shown in Figure [T] The top row represents the cur- 
rent index sets. The bottom row depicts the corresponding sparse grid. At each step 
the forward neighbours of the grid index i with the largest error rj (striped box) 
are checked for admissibility. A forward neighbour is admissible if all indices in its 
backwards neighbourhood are in the old index set (grey boxes). All admissible in- 
dices (pointed to by an arrow) are added to the active index set (black and striped 
boxes). 

The striped box i = (1, 1) in the first step has two admissible neighbours as the 
backwards neighbourhoods of both forward neighbours are complete. In comparison 
the striped box i = (2, 1) in the second step only has one admissible index. The 
index ji = (3,1) has two backwards neighbours ji — ei = (0,2) and ji — e2 = (1,1) 
in the old index set, and thus is admissible. In contrast the index }2 = (2,2) has 
one backwards neighbour in the old index set }2 — ei = }2 and one in the active set 
}2 — e.2 = (2, 1), and so is not admissible. 

Although we wish to use the generalised sparse grid algorithm for interpolation 
the error criteria we have proposed are based upon an integral formulation. Specif- 
ically the error indicator rj measures the contribution of the index i to the global 
integral approximation. Furthermore the algorithm is terminated when the approxi- 
mated error r in the integral is below a predefined threshold e. This choice was made 
purposefully. 
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Fig. 1. Three steps of the generalised sparse grid algorithm. The top row represents the current 
index sets. Active grid indices are in black, indices in the old index set & are in grey and the ac- 
tive index with the largest error indicator is striped. The bottom row depicts the corresponding 
sparse grid. Only the points associated with indices in the old index set are shown. 



The magnitude of the hierarchical surplus, which is the size of the difference be- 
tween the true function and the sparse grid approximation at a grid point, may be 
more synonymous with interpolation. Simply adding indices with large hierarchical 
surpluses, however, is inefficient. The magnitude of the hierarchical surpluses de- 
cays slowly in regions adjacent to discontinuities. At the site of jump discontinuities 
the hierarchical surplus will be at best half of the magnitude of the jump, for any 
finite number of grid points. Thus the algorithm can proceed much further than is 
necessary. The use of the error criterion r\ provides a lower bound on the size of the 
support of the basis functions used by weighting the magnitude of the hierarchical 
surplus by the probability that an arbitrary point x will fall within its support. 

Regional Adaptivity 

Each time a grid index i is added to the active index set, the traditional generalised 
sparse grid algorithm evaluates all the points in the set B\. Such an approach is ineffi- 
cient if a large proportion of the function variability is concentrated in small regions 
of the input space. When using equidistant grids, the creation of the grid i requires 
approximately two times the number of grid points (and thus function evaluations) 
than those necessary to construct the index i — e ; . Consequently we propose intro- 
ducing a locally adaptive procedure to construct the points associated with each grid 
index. To incoporate local adaptivity into the generalised sparse grid algorithm we 




# □ • * • 

i - e 2 = (2, 1) 

Fig. 2. An example of local adaptation integrated with the generalised sparse grid algorithm. 
Assume that the function only varies significantly in the left half of the domain. The top 
left and bottom right grids are the backwards neighbours of the grid being created. Circles 
represent points in the sparse grid, squares are points in the active point sets, and crosses are 
points in the redundant point sets. The active (square) points in the backwards neighbours are 
refined to produce the set of new points that must be added. In this example only two new 
points (squares in grid i = (2,2)) are added. 



define the two sets and M\ for each grid index i. We refer to these sets respectively 
as the active point set and redundant point set of the grid index i. The active point set 
srfy contains all admissible points associated with the index i with an error indicator 
7i,j > £■ The redundant point set M\ contains all admissible points with yi> < e. A 
point is admissible if one of its d possible ancestors exists in the grids associated 
with the backwards neighbourhood of i. If, and only if, a grid point is admissible 
it is created (the function is evaluated) and the error indicator /; j calculated. This 
drastically reduces the number of points generated when a new grid index is created. 
To guide local refinement we propose using the error indicator 

Kj = krvfy| (8) 

If 7ij > £ the point Xj j is added to the active point set otherwise it is added to the 
redundant index set The procedure used to implement /t-adaptivity on each grid 
index is outlined in Algorithm[2] 

Figure[2]shows an example of /i-adaptivity integrated with the generalised sparse 
grid algorithm. Here the grid index i = (2,2) has been deemed admissible by the 
generalised sparse grid algorithm. Both the backwards neighbours (i — ei = (1,2) 
and i — ei = (2,1)) exist in the old index set 0. The active points in the backwards 
neighbours are used to determine which points in the active index i should be eval- 
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Algorithm 2 CreateGrid(i) 
for (ne{l,...,d}) do 
for (Xj_ en j 6 ^4- e „ ) do 

^=FindAxialChildren(xj_ en j ,n) 
for ( x ik e <if ) do 
if ( f l k > e ) then 

s4 := ^U{x ijk } 
else 

:=^U{x lik } 

end if 
end for 
end for 
end for 



uated. For any point in the active set of the n-th backwards neighbour n = 1 , . . . , d 
the children of that point are created in the n-th axial direction. This refinement is 
carried out for all points in the active point set of the backward neighbour and for all 
backwards neighbours. 

Efficient Termination 

The generalised sparse grid (GSG) algorithm is a greedy algorithm which efficiently 
identifies the sparse grid index set that is necessary to interpolate a function up to 
a level of predefined accuracy. The algorithm can determine the number of variable 
interactions and the individual importance of each variable [8|. This is achieved by 
successively adding the grid index with the largest error indicator to the old index set 
and searching its forward neighbourhood for admissible indices. Every admissible 
index is added (and thus created) to the active index set without regard for the error 
associated with that grid. The algorithm finally terminates when Y,ie^ r i < e - 

The decision to add all indices i, regardless of the size of their associated error 
indicator r;, typically results in the creation of a large number of grids with rj << e 
and which have little effect on the accuracy of the approximation. To reduce the 
number of these unimportant indices we propose only adding admissible indices with 
rj > £ to the active index set si ' . This significantly reduces the number of grid indices 
in the final index set J> and thus the total number of function evaluations, with only 
minor effect on the overall accuracy of the generalised sparse grid method. 

3 Error Analysis 

In this section we derive a bound on the error of the proposed n-GSG method. For 
ease of discussion let us rewrite |6]l in the following form 

fiA5) = ZMS), MS)=Z v u -^eWj (9) 

i<I jGBj 
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The proposed h-GSG algorithm terminates when all points in the sparse grid with 
an error indicator y, j > e have been considered. This truncation of the sparse grid 
space has an effect on the accuracy of the approximation. This effect is quantified by 
the following theorem. 

Theorem 1. Let f e . op t be an interpolation of u that obtains ||/ — / e , pt||g < £ with 
the least number of function evaluations. Then for any function u and a given toler- 
ance £ > and the error criterion }fj = ||vij • V{j || ?) the h-GSG approximation f £ j 
satisfies 

\\f-f e , d \\ q <e(l+N(e)) 

where N(e) is the number of points in the optimal interpolant but not in the h-GSG 
interpolant. 

Proof. Let 

/e,opt = E V 'J ' ^ J 

(U)e^e,opt 

be an interpolation of / that obtains \\f — f e , pt\\q < £■ The points in this optimal 
approximant are defined by the index set 

Pe,o V t ■= {(i,j) : i € J^, pt and j e Bi, e , op t Q #i} 
Similarly denote the h-GSG interpolant with an error indicator y,j by 

hA= E VySij 

where the point indices in the h-GSG approximant are 

P e ,d ■= {(U) : i e J^, j e Bi and |vy| > e} 

Now denote p common (p £ d p|P £ opt ) the set of indices common to both the optimal 
and h-GSG approximants and denote P uni i ue := {(P e ,d\JPe,opt) \P coraraon ) the set of 
indices that exist only 

We can split p um i ue further into "p 1 " 6 and f"™ que which are points unique to the 
optimal approximant and the h-GSG approximants respectively. Using this splitting 
and the linearity of the hierarchical interpolants / e , pt and f e j, yields 



where 



and 



r _ /.unique . /-common ~ nr \ r /.unique . /common 

Je.opt — / e , pt + J ana j ed — j £d + j 

/.unique _ ,,..111. /-unique _ ,,..111. 

Jeppt — ^ij *ij j Je.d — 2-i Vl 3 



(j j^pcommon 
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Using these definitions we can write 

\\f-fe,d\\q = ||/-/e,opt + /e,opt-/s,</|| ? 

11^ r | / /-unique . /-common \ //-unique . />common\u 

— 11/ -Je, opt + \J £ . opt + J )~Ue,d + J )\k 

<ii/-(/,o Pt +/ e T ue )ii,+ii/rop q t ue ii, d°) 

Assuming that the adaptivity of the /i-GSG method works perfectly, that is 



unique 



then 



ii/grii, = 



< 



1 e,opt 

<#(^pT)- £ 



(ii) 



(12) 



By definition of the optimal interpolant 

ll/-(/ £ ,op t +/"7 ue )ll,<£ 

Setting A^(e) = #(^™pt Ue ) we arrive at the assertion. □ 

Theorem [T] states that the accuracy of the /i-GSG interpolant is dependent on the 
number of points with cumulative y, j < £ that are not in the approximation but have 
7,j « e. The exact number A^(e) of these points is dependent on the smoothness of 
the function being approximated. The smoother the function, that is the faster the 
hierarchical coefficients decay, the smaller A^(e) will be. 



4 Numerical Study 

In this section we investigate the performance of the proposed /i-GSG method when 
applied to a number of numerical examples. We analyze convergence, with respect 
to the order of the local polynomial basis and the dimensionality for functions of 
varying smoothness. First we consider a set of two-dimensional functions which 
visually illustrates the effect of the choice of basis degree and the performance of 
local adaptivity. We then discuss the performance of different basis functions when 
applied to functions of differing regularity. The effect of the termination condition 



presented in Section 2.4 is also presented. Finally the utility of the proposed method 
is shown for high-dimensional approximation with hundreds of variables. 
In the following we will consider the following four functions: 
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*« )= |03-ff-g|-HU ' ^ [ °^ (13) 
/f(€)=exp^-£ cftfi-wt) 2 ), ^G[0,l] d (14) 

#($)=«p^-£ c,^- Wi |], ^[O.lf (15) 

/*(«) = <f° n if ^> Wl0r ^> W2 ) § «=[„,!]' (16) 

I ex P (L;=i c 'biJ otherwise 

Unless otherwise stated, the coefficients w, = 0.5, z' = 1 , . . . ,d. The choice of c, deter- 
mines the effective dimensionality of the function and is defined differently for each 
problem. Although smooth, the mixed derivatives of the Gaussian function f£ can 
become large and thus degrade performance if not compensated for by appropriate 
adaptivity. The discontinuities in functions f£ and ff also degrade, with increasing 
magnitude, the efficiency of isotropic methods and subsequently can highlight the 
strengths and weaknesses of any interpolation method. 

In the following we will analyze convergence with respect to the following mea- 
sures: 

e r = max 

i=l,...JV 




£fi = 

where / and g are the true function and approximation respectively. In all the follow- 
ing examples N = 1000. Error in the quadrature rule / a pprox is also considered and 
measured by 



^integral 



^approx ^e: 



'exact 

where 4xact is the exact integral. Unless otherwise stated this value is calculated 
analytically. 



4.1 A two-dimensional example 

Let us first consider two low dimensional functions, ( fT3j ) and ( p"5| ), defined on the unit 
hypercube [0, l] 2 . Figure [3] depicts the grids generated when the proposed method is 
applied to the piecewise continuous function ( fT5| using linear basis functions (a) and 
quadratic basis functions (b). This function has discontinuities in its first derivatives 
along ^1 = 0.5 and ^2 = 0.5. The linear h-GSG grid concentrates grid points around 
the rapidly varying region associated with the discontinuous change in the derivative 
information. In comparison the quadratic basis requires significantly less function 
evaluations. The quadratic basis is able to obtain second order convergence in each 
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of the four smooth quadrants whilst still approximating well at the discontinuities. 
Here c,- = 10/2 !+2 , and the absolute error criterion <[8j is used with e = 10~ 6 . 

The function ( fL5] l possesses discontinuities that lie along the axial directions. Let 
us now consider function ( fT3| l which possesses a singularity that passes through both 
axial directions. Figure[4]depicts the grids obtained using a tolerance of e = 10~ 6 and 
linear (a) and quadratic basis functions (b). The linear /z-GSG grid concentrates grid 
points around the rapidly varying region associated with the discontinuous change 
in the derivative information. Unlike the previous example it is now unclear whether 
the use of the quadratic basis function results in increased efficiency. The accuracy 
of the linear basis functions is higher (Ef. = 3.19 • 10~ 3 ) than when the quadratic 
basis is used {e t i — 1.15 • 10~ 2 ) but the linear method requires almost three times as 
many points. The effect of varying the degree of the local basis is discussed in the 
following section. 



0.8 



iTB 0.8 



0.2 



tu as as To 



(a) p = 1, (2477 points, £p = 1.18 • 1(T 4 ) (b) p = 2, (1257 points, Bp = 4.67 • 1(T 5 ) 



Fig. 3. The adaptive grids obtained using basis of varying degree. Here the error criterion ( 
is used with e = 10 . 



4.2 Increasing the Degree of the Local Polynomial Basis 

Let us now consider some moderate-dimensional integrals (d = 10) and discuss the 
effect of the degree of the local polynomial basis on the efficiency of the proposed 
method. Setting c; = l/2 !+2 , Figure [5] compares the rates of convergence with re- 
spect to the tolerance e when the proposed method is applied to the three test func- 
tions (fl4]i-([T6|. Figure [5] (a) illustrates convergence with respect to the eg- measure 
for the smooth function /2 =1 °. In this case the effect of the higher-degree basis is 
clearly evident. The quadratic basis provides drastic improvement over the standard 
piecewise-linear basis and the quartic basis provides a further increase in efficiency. 

This result is mirrored when /z-GSG is applied to the piecewise-continuous func- 
tion f*= w (Figure||(b)). However when a jump discontinuity is present (func- 
tion (15) /"°) the perfor mance of the quartic basis functions is reduced (Fig- 
me|5J(c)). 
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(a) p = 1,(9127 points, Bp =3.19- 10~ 3 ) (b) p = 2,(N = 3980, £p = 1.15 • 1CT 2 ) 

Fig. 4. The adaptive grids obtained using basis of varying degree. Here the error criterion ^ 
is used with e = 10 . 

The quadratic basis significantly increases the accuracy of the /z-GSG method 
for smooth and discontinuous functions. Higher order basis functions p > 2 pro- 
vide further increases in the rate of convergence obtained for smooth problems, but 
performance is degraded for discontinuous problems. These results are reproduced 
when higher-dimensional realizations of these functions are considered. 



4.3 Efficient Termination of /z-GSG 

In Section |2.4| we proposed that the efficiency of the generalised sparse grid algo- 
rithm can be improved by only adding admissible indices with r\ > e to the active 
index set. Here we substantiate that claim. The h-GSG method discussed here and 
throughout this manuscript implements this modification. 

Again consider the three test functions (p~4]>-(|T6]>. But now let us investigate per- 
formance when d = 100 and 

Ci = Aexp(— i=l,...,d 

where the parameter A which controls the effective dimensionality of the function. 
Figures [6] illustrates the difference between the proposed method with and without 



the modification proposed in Section 2.4 Specifically the figures depict the error 



in the sparse grid interpolant as the algorithm evolves. The modification results in 
substantial improvement when applied to the smooth (not depicted) and piecewise- 
continuous (Figure [6] (a)) functions. The unmodified algorithm adds many points 
corresponding to grid indices with r; < e and which contribute to the interpolation 
error. The modification limits the number of unimportant points. 

When applied to the discontinuous function (Figure [6] (b)) the modification re- 
sults in lower accuracy than when the unmodified algorithm is used. The unmodified 
algorithm continues to add points belonging to grids with < e and which contribute 
little to the integral of the function yet still significantly influence the accuracy of the 
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icH io" in 4 

Number of Grid Points 
(c) ep error in fa 



Fig. 5. Error in the interpolants of fa, fa, and fa for p 6 {1,2,4}, e = 10 8 and = 10. 



interpolant. The points belonging to r\ < e mainly reside around the discontinuity. 
As the level of refinement increases, the contribution of these points to the integral 
decreases yet their effect on the interpolant may not. This effect is illustrated in Fig- 
ure |6](c) which depicts the decrease in the error of the integral approximation with 
and without the modification when /t-GSG is applied to the discontinuous function. 
Here it is clear that adding grids with r; < e has little effect on the accuracy of the in- 
tegral approximation. Also note that the effect of the termination condition decreases 
when the dimensionality d is small. 



4.4 High-Dimensional Interpolation 



In Section 4.3 we used /z-GSG for interpolation for a set of 100-dimensional prob- 
lems. In this section we show that the proposed method can be applied to much 
higher-dimensional problems. Again consider the discontinuous function ( |T6] > which 
is the most difficult function to approximate of the three test functions used thus far. 
Again let 
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io° in 1 u? io 5 in 5 in 6 

Number of Grid Points 



(c) integral error in ff 100 

Fig. 6. The evolution of the error in the interpolant when applied to the test functions with and 
without the proposed termination condition. Here d = 100 and e = 10~ 6 . Initially the results 
are visually identical. Differences only occur towards the end of the algorithm. 



c ,. = Aexp(-^) (17) 

Table[T]shows the number of function evaluations required to approximate fa and 
the resulting relative error in the approximated integral when e = 10~ 4 , quadratic 
basis functions are used, and X = 1 . An analytical expression for the integrand can 
be obtained easily due to the exponential nature of the function. Arbitrary precision 
arithmetic was used to evaluate the numerical value of the reference integrals. Due 
to the large range of values that fa can take in high dimensions we use the relative 
error indicators 









w 0,0 • Vo,0 




w o,o ■ v 0,0 



to respectively guide difference space selection and local refinement. The /i-GSG 
algorithm is terminated when 
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wo.o • v 0,0 

where r is the global error indicator used in Algorithm[T] An error of the order 10~ 2 
is achieved for up to 700 dimensions using less than 300,000 function evaluations. 

The accuracy of the integral approximation decays with increasing dimensional- 
ity. This is likely caused by the particular error indicators (y, j and r\) used to guide 
adaptivity. At the moment a point ^ j j is refined if y, j > e and a grid index i is flagged 
for refinement only if rj > e. This approach works well when d < 400 but could be 
improved upon when the dimensionality is higher. By excluding points if they have 
"fi j less than the desired accuracy e we are potentially ignoring a significant num- 
ber of points whose combined contribution to the integral is greater than e. As the 
dimensionality increases more and more points will be excluded from consideration 
thereby causing the accuracy of the approximant to decrease. This remark is consis- 
tent with Theorem [T] which states that the accuracy of the /i-GSG approximation is 
dependent on the number of points with yi s close to e. As the number of these points 
increases the accuracy of the approximation decreases. 

The decrease in accuracy depicted in Table [T] could be addressed by utilising 
more appropriate error criteria than those used here. The construction of efficient and 
robust error indicators is problem dependent and must be based upon the properties 
of the function under consideration. If no information on the function is available, 
we have shown that the error indicators used here will still perform well. 



Table 1. Errors in the h-GSG approximation of {T6]» for d = 100 to 700. e = 10~ 5 



d 


N 


^-integral 


100 


3,376 


3.81 


10~ 


4 


200 


12,488 


1.67 


10~ 


3 


300 


31,533 


1.71 


io- 


4 


400 


62,404 


8.44 


io- 


5 


500 


109,356 


4.57 


io- 


3 


600 


176,842 


7.97 


io- 


3 


700 


269,665 


1.68 


10" 


2 



Here we note that the rate of convergence of the /i-GSG method is governed by 
the implicit weighting of the importance of each dimension. In this case the impor- 
tance is controlled by the coefficients c,. To illustrate this dependence, Table|2]shows 
the efficiency of /i-GSG as the magnitude of the coefficients is increased. As the "im- 
portance" of each dimension increases the number of sub-dimensional components 
increases. In this case an increase in c,- also increases the function variability which 
also requires additional points to achieve a set level of accuracy. 



20 



J.D Jakeman and S.G. Roberts 



Table 2. Errors in the h-GSG approximation of {16} for increasing dimension importance 
e = 10~ 6 . Importance is increased by increasing X in Equation {T7J. 



A 


N 


^integral 


1 


9,226 


1.66-10" 


-4 


2.5 


34,977 


2.96 • 10" 


-5 


5 


175,201 


6.53-10" 


-4 


7.5 


659,368 


1.93-10" 


-3 



5 Conclusion 

This paper presented an /Vadaptive generalised sparse grid (/i-GSG) method for in- 
terpolating high-dimensional functions with discontinuities. The proposed algorithm 
extends and improves upon existing approaches by combining the strengths of the 
generalised sparse grid algorithm and hierarchical surplus-guided /z-adaptivity. 

The underlying generalised sparse grid algorithm greedily selects the subspaces 
that contribute most to the variability of a function. The hierarchical surplus of the 
points within each subspace is used as an error criterion for /z-refinement with the 
aim of concentrating computational effort within rapidly varying or discontinuous 
regions. This approach limits the number of points that are invested in 'unimportant' 
subspaces and regions within the high-dimensional domain. 

A high-degree basis is used to obtain a high-order method that, given sufficient 
smoothness, performs significantly better than the traditional piecewise-linear basis. 
When discontinuities are present in the function surface or its derivatives, perfor- 
mance deteriorates. However, it was shown numerically that even in such situations 
the quadratic basis will still result in higher-rates of convergence than that achieved 
by using piecewise-linear interpolation. 

Often the importance of function variables are governed by natural yet unknown 
weights. In these cases, the proposed method can utilise this implicit weighting to 
determine and restrict effort to the effective dimension of the model. This property 
allows the /z-GSG method to be applied to non-smooth functions with hundreds of 
variables. 
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